10. OpenAI Request for Research


So far in this lesson, you have learned about many black-box optimization techniques for finding the optimal policy.

Take the time now to implement some of them, compare their performance on OpenAI Gym's CartPole-v0 environment, and run each algorithm for many random seeds to test stability.

Note: This suggested exercise is completely optional.

Once you have completed your analysis, you're encouraged to write up your own blog post that responds to OpenAI's Request for Research! (This request references policy gradient methods. You'll learn about policy gradient methods in the next lesson.)

OpenAI Gym's CartPole environment

Implement (vanilla) hill climbing and steepest ascent hill climbing, both with simulated annealing and adaptive noise scaling.
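If you're unsure where to begin, below is a minimal sketch of vanilla hill climbing with adaptive noise scaling, assuming the classic Gym API (`env.step` returning four values) and a deterministic linear policy; the single-episode solve check is a rough stand-in for the official criterion (average return of at least 195 over 100 consecutive episodes). For steepest ascent, you would instead sample several perturbed candidates per iteration and keep the best; for simulated annealing, you would shrink the noise scale on a fixed schedule rather than adaptively.

```python
import numpy as np
import gym

def run_episode(env, weights, max_t=1000):
    """Return the total reward of one episode under a deterministic linear policy."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_t):
        # Linear policy: pick the action with the larger score.
        action = int(np.dot(state, weights).argmax())  # weights shape: (obs_dim, n_actions)
        state, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

def hill_climbing(env, n_episodes=1000, noise_start=1e-2,
                  noise_min=1e-3, noise_max=2.0):
    """Vanilla hill climbing with adaptive noise scaling (a sketch)."""
    best_weights = 1e-4 * np.random.randn(env.observation_space.shape[0],
                                          env.action_space.n)
    best_return = run_episode(env, best_weights)
    noise_scale = noise_start
    for episode in range(1, n_episodes + 1):
        # Perturb the current best policy and evaluate the candidate.
        candidate = best_weights + noise_scale * np.random.randn(*best_weights.shape)
        candidate_return = run_episode(env, candidate)
        if candidate_return >= best_return:
            # Improvement: accept the candidate and shrink the search radius.
            best_weights, best_return = candidate, candidate_return
            noise_scale = max(noise_min, noise_scale / 2)
        else:
            # No improvement: widen the search radius.
            noise_scale = min(noise_max, noise_scale * 2)
        # Single-episode proxy for CartPole-v0's 195-over-100-episodes criterion.
        if best_return >= 195.0:
            return episode, best_weights
    return n_episodes, best_weights
```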

If you also want to compare the performance to evolution strategies, you can find a well-written implementation here. To see how to apply it to an OpenAI Gym task, check out this repository.

To see one way to structure your analysis, check out this blog post, along with the accompanying code.

For instance, you will likely find that hill climbing is very unstable: the number of episodes it takes to solve CartPole-v0 varies greatly with the random seed. (Check out the figure below!)

Histogram of number of episodes needed to solve CartPole with hill climbing. ([Source](http://kvfrans.com/simple-algoritms-for-solving-cartpole/))
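To collect the data behind a histogram like the one above, you might wrap training in a driver that varies the random seed. The snippet below is a sketch that reuses the `hill_climbing` function from the earlier sketch and assumes the classic Gym seeding API.

```python
import numpy as np
import gym

# Sketch of a driver: measure episodes-to-solve across many random seeds,
# reusing the hill_climbing() function defined above.
episodes_to_solve = []
for seed in range(100):
    env = gym.make('CartPole-v0')
    env.seed(seed)          # classic Gym seeding API
    np.random.seed(seed)    # also seed the policy perturbations
    n_episodes, _ = hill_climbing(env)
    episodes_to_solve.append(n_episodes)
    env.close()

print(f"median: {np.median(episodes_to_solve):.0f}  "
      f"min: {min(episodes_to_solve)}  max: {max(episodes_to_solve)}")
```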